A Parallel Multikey Quicksort Algorithm for Mining Multiword Units
Abstract
In the context of word associations, multiword units (sequences of words that co-occur more often than expected by chance) are frequently used in everyday language, usually to precisely express ideas and concepts that cannot be compressed into a single word. For instance, [Bill of Rights], [swimming pool], [as well as], [in order to], [to comply with] or [to put forward] are multiword units. As a consequence, their identification is a crucial issue for applications that require a certain degree of semantic processing (e.g. machine translation, information extraction, information retrieval or summarization). In order to identify and extract multiword units, [Anonymous, 2002] has proposed a statistically-based architecture called SENTA (Software for the Extraction of N-ary Textual Associations) that retrieves, from text corpora, relevant contiguous and non-contiguous sequences of words.
Similar resources
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredi...
Fast Construction of ZDDs from Large-scale Hypergraphs
(Abstract) We present an algorithm to compress hypergraphs into the data structure ZDDs and analyze the computational complexity. Since a ZDD provides an approach to solve large-scale problems that are difficult to compute in a reasonable amount of time and space, it is important to compress hypergraphs efficiently. Our algorithm uses multikey Quicksort given by Bentley and Sedgewick. By conduc...
Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units
The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, ...
Multilingual Aspects of Multiword Lexical Units
As most of the machine-readable dictionaries contain clearly insufficient information about multiword lexical units, there is a constant need to extend and tune specialized lexical databases to account for new expressions. In this paper, we present a system exclusively based on statistics that massively extracts from unrestricted text corpora contiguous and noncontiguous rigid multiword lexical...
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...